A variational approach to the stochastic aspects of cellular signal transduction 
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Cellular signaling networks have evolved to cope with intrinsic fluctuations, coming from the small numbers 

of constituents, and the environmental noise. Stochastic chemical kinetics equations govern the way biochemical 

networks process noisy signals. The essential difficulty associated with the master equation approach to solving 

the stochastic chemical kinetics problem is the enormous number of ordinary differential equations involved. 

In this work, we show how to achieve tremendous reduction in the dimensionality of specific reaction cascade 

dynamics by solving variationally an equivalent quantum field theoretic formulation of stochastic chemical 

, kinetics. The present formulation avoids cumbersome commutator computations in the derivation of evolution 

equations, making more transparent the physical significance of the variational method. We propose novel 

time-dependent basis functions which work well over a wide range of rate parameters. We apply the new basis 

functions to describe stochastic signaling in several enzymatic cascades and compare the results so obtained 

with those from alternative solution techniques. The variational ansatz gives probability distributions that agree 

well with the exact ones, even when fluctuations are large and discreteness and nonlinearity are important. 

• ' A numerical implementation of our technique is many orders of magnitude more efficient computationally 

O , compared with the traditional Monte Carlo simulation algorithms or the Langevin simulations. 

• i-H 

, Keywords: Stochastic Processes, Nonlinear Chemical Kinetics, Variational Approach, Quantum Field Theory, Signal Trans- 

duction. Discrete Noise, Strong Fluctuations, Master Equation 
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l/-^ ■ I. INTRODUCTION 



The life of the cell is regulated by intricate chains of chemical reactionsl. The whole cell may be viewed as a computing 
device where information is received, relayed and processed^ Signal transduction cascades based on protein interactions reg- 
\^ ' ulate cell movement, metabolism and divisioni*^. Since cells are mesoscopic objects, understanding the role of the intrinsic 
. fluctuations of the biochemical reactions as well as environmental fluctuations is a fundamental part of understanding signaling 
■ (jyjj^jjjjQgi5j6,7i8j9ji0jiiji2j3j4^ jj^j^ regard, the well-organized behavior of cells, which emerges as a result of biochemical 
reaction dynamics involving hundreds of cross-linked signaling pathways, is remarkablai^iiii^ii2i22iSii2». The problem of how 
' signals can be precisely detected, smoothly transduced and reliably processed under noisy conditions is a research topic of great 
qh' current interest, that, in turn, should lead to deeper understanding of the origins of the cell's functional responses2i2i. Further- 
more, these studies can help to unravel the design principles for various signaling pathways, leading, eventually, to better ways 
to control and efficiently interfere with cellular activity, as would be needed to correct the behavior of diseased cellsi^i^^. 

The role of noise in gene regulatory networks has been identified as a key issue and has been intensively studied in recent 
^ ' yg^ji0jiiji2,26,27j28j29,30^ Linearization of the noise may be acceptable if the dynamics near steady states is being studied'"--*-U. 
9^ 1 When protein numbers are large and, thus, the continuous approximation is valid, time-dependent distributions have been de- 
termined using the Langevin or Fokker- Planck equations ^f^^iS. To account for the discreteness in the linearized equations, the 
generating function approach has also been usedi2i2&. A variational treatment of steady state stability and switching in nonlinear, 
discrete gene regulatory processes has been reported22i^. 

In cytosolic signal transduction processes, in contrast to gene transcription which involves a unique DNA molecule, all the 
reacting species are present in multiple copies and participate in unary, binary or perhaps even higher order reactions. Noise 
could be multiplicativ e^' '^1-^^ and the linear description easily breaks down. Moreover, cellular reactions usually take place hetero- 
geneously in space The localization and compartmentalization of protein organelles require diffusive or active transportation of 
reacting molecules from one region to another. Spatial coordination combined with temporal coordination generates coherent, 
yet complex spatiotemporal pattern9i^ii2i22iSiiaa^^i2i^. 
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The extracellular ligands often trigger cascades of chemical reactions which propagate inside a cell and induce responses from 
various environmental cues. The cell body is a highly heterogeneous entity and never settles to a steady state. To understand 
cell dynamical processes, an explicitly time-dependent description is required. Within a volume with linear dimensions of the 
Kuramoto lengt hWi! , diffusion mixes the reagents in a nearly uniform manner. If the reactions are considered in the Kuramoto 
volume, it is reasonable to neglect the spatial heterogeneity. For many signal transduction networks, however, it is likely that 
only a few proteins are present in the Kuramoto volume (determined by specific reaction and diffusion rates), and, therefore, the 
continuous description of protein numbers breaks down. To characterize stochastic signaling reactions in this volume, a time- 
dependent description of a noisy, discrete, nonlinear system is required^. In many situations, such as Drosophila oogenesis, 
the exact shape of the probability distribution profile is very important and determines different developmental paths^- "*'. In the 
following, we discuss efficient techniques to compute the time-dependent protein number probability distributions in biochemical 
reaction networks when the number of protein copies is small. 

The Gillespie algorithm provides an effective Monte Carlo technique for simulating stochastic chemical reactions^Siiiil. Each 
simulation gives a reaction trajectory which is close to the deterministic trajectory in the large particle number limit. To get well- 
converged statistics, many trajectories may be needed, often on the order of 10^. If there is a separation of time scales of the 
constituent chemical reactions, Gillespie simulations also become exceedingly slow since the reaction events are dominated by 
the fastest reactions while the observables typically involve the slowest reactions. Although considerable progress has been made 
in accelerating such simulations^i^SiSi^^, computational inefficiency continues to be an impediment, especially for the spatially 
inhomogeneous generalization of the Gillespie algorithm. Furthermore, it is hard to extract the analytical structure of the solution 
from the numerical results, which can be important for achieving a deeper physical understanding of the system behavior when 
the rate parameters are widely varied. 

Mathematically, a stochastic process may be completely characterized by a master equation - a group of ordinary differential 
equations (ODEs) describing the evolution of probabilities^i^^. The main difficulty in solving a chemical master equation 
is the enormous number of ODEs involved even for a small reaction cascade. A number of analytical techniques have been 
developed for solving approximately the master equatiorkiSi2^i2^. In this work, we show how to achieve enormous reduction 
in the dimensionality of specific reaction cascade dynamics by solving variationally the quantum field theory (QFT) equations 
of stochastic chemical kinetics^i^M^iSli^. Our present approach is based on mapping the master equation ODEs into a single 
partial differential equation (PDE) and applying a variational technique which reduces the PDE into a small number of ODEs. 
The variational QFT approach has been employed to study steady state stability and switching in gene regulatory networks22i^. 
In this work we propose novel time-dependent basis functions appropriate for describing protein signaling cascades which work 
in a wide range of rate parameters. Our method gives probability distributions that agree well with the exact ones, including 
when the fluctuations are large and discreteness and nonlinearity play important roles. 

The paper is organized as follows. In section|ll] the QFT formulation of the stochastic processes describing chemical reac- 
^jQjjg29j30j47j50j5ij52 Eyink's variational solution technique for solving such field theories^^i^ are presented. We show that the 
QFT formulation is equivalent to a generating function approach and also discuss the physical significance of the variational 
principle in this context. In section Ell] we apply the new trial functions and the variational technique to a number of 2-step, 
3-step, and 4-step enzymatic reaction cascades and compare our results with those found with more traditional methods. Finally, 
we discuss the more general principles of basis function construction and the limitations of variational approaches. 

II. QUANTUM FIELD THEORY FORMULATION, VARIATIONAL PRINCIPLE AND GENERATING FUNCTIONS 

In this section, we first discuss briefly the master equation and demonstrate its application to a 2-step signal amplification 
cascade. Next, the master equation is recast into a QFT form in which the probability evolution is governed by a "wave equation". 
Then, we show that the field theoretic formulation is equivalent to a generating function approach. To solve these equations, 
Eyink's variational technique and its physical significance are examined. We further explore the practical implementation issues 
in sectionlVl 

A. The master equation and its solution 

Unlike a stochastic simulation which produces an individual random trajectory and generates statistics only after a large 
number of samplings, master equations directly describe the evolution of probability distributions in the state space of a system 
based on specific inter-state transition rates. For a discrete system, the master equation consists of a set of ODEs (see the 
following examples), while for a continuous system it becomes an integro-differential equation such as the Boltzmann equation. 
Although the master equation is a complete description of a Markovian system, its solution is usually difficult and requires 
special techniques. This paper presents one variational technique that could be used. 

As an example, let's consider the following set of equations that represents the simplest enzymatic signal amplification process 
(Fig. 0. In this simple reaction scheme, without feedback loops, R represents an inactive receptor, which becomes activated 
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into R* upon binding of an external ligand (stimulus). When the receptor is activated, it acts as an enzyme, catalyzing the 
phosphorylation of the next kinase downstream {A + R ^ A* + R*) with a rate fi. A* spontaneously decays to A with a rate A 
and R* to R with a rate k. In the absence of R*, A ^ A* may occur naturally, however, with a much lower rate, so that it can 
be ignored when we introduce the catalyst R*. 



C ^ } ^ hormone^ ^ 

FIG. 1: An inactive receptor R, when activated by a signal, activates downstream protein A. 

Although the R* reaction is unary and independent of the A reaction, the latter one is binary, making the system nonlinear, 
thus, different from those considered in a number of prior works on the gene regulatory networksiSiiiiS^iS. To write the master 
equation, we denote by P(m, n) the probability of having m i?*'s and n A's, then 

dP 

-^{m,n) = fi[—mnP{m,n) + m{n + l)P{m,n + l)] 

+ X[-{N - n)p{m, n) + {N -n + l)P(m, n - 1)] 

+ g[-P{m,ji) + P{m-l,n)] + k[-mP{m,n) + {77i + l)P{m+l,n)], (1) 

where N is the total number of A and A* . In Eq. Q, the first two terms describe the A — A* reaction and the rest the R — R* 
reaction. This simple 2-step cascade is commonly found embedded in the onset of a reaction pathway of many important 
signaling cascades^^f^. If a large number of inactive receptors, R, are present the rate of conversion depends on the arrival times 
of the external cue and the reaction becomes Poissonian. We assume that this is the case in all the following calculations. If the 
R ^ R* reaction is the usual birth-death problem, our formalism still applies with only minor changes. The master equation Q 
actually contains infinitely many coupled ODEs. 



* decay 

R ^— ►R 

* decay 



B. The QFT formulation 

The differential-difference equations, such as Eq. Q, are well represented in the QFT formulation by introducing creation 
and annihilation operators a, and states 1 7T,v29j30j47j48j49j50^ analogy to quantum mechanics, the operators satisfy the com- 
mutation relation that 

[a, a^] — 1 . 

As usual, the "vacuum state" |0) and its conjugate (0| are defined to satisfy 

(0|at = a|0) = 0, (0|0) = 1. 
Other states are built up from the vacuum state, such as the n-particle state \n) 

|n)=at"|0). 
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It is easy to check with the help of the commutation relation 

a\n) ~ n\n — 1) , a^\n) = |n + 1) , a'aln) = n\n) . 
Hence, a^a is the "particle number operator". Notice that the states |n) are not normalized in the usual sense since 

(n\n) = (0|a"|n) = n! , 

but they are orthogonal, 

(m|?i) = (0|a"|?i) = 0, for 771 7^ n. 
The state that corresponds to a probability distribution P{n) is 

|*)=^P(n)|n). 

n 

The probabilities are thus encoded into the coefficients of different particle number states superimposed into the "wave function" 
1 5'). In order to compute physical observables, the harvesting state = (Oje" is introduced. It is easy to check that 

(0|n> = l, {m = l, (0|(atar|vI/) = (O. 

The first equation shows the particular normalization of an n-particle state. The second equation corresponds to the probability 
conservation J^n Pip) = 1- The third equation may be used to calculate the m-th moment of the particle number. The evolution 
of probabilities is governed by a wave equation for 4': 

^ = ^^l*> . (2) 

at 

The original large sets of ODEs are now compactified into just one equation. Applied to the 2-step cascade (Fig. 0, Eq. (|2ji is 
characterized by following operator fi, 

n = {l- a)){^ib'^ba - A7V + \a)a) + g{h^ - 1) + k{b - b^b) , (3) 

where 6^ , b are the creation and annihilation operators associated with R* and at , a with A. In this case, 

I'l') = P{m, n)\m, n) , 

where 

|m,n) =at"&t"|o). 

Eq. (|3} is readily verified by substituting into Eq. (|2ji and comparing the coefficients of each (m, n)-particle state. In contrast to 
ordinary quantum mechanics, the operator is non-Hermitian, so the inner products between the states are not conserved. This 
was the reason to introduce earlier the harvesting state. Nevertheless, many QFT techniques may be profitably applied, albeit 
with some modifications-''-^^" '***-^*'-^^-^**-^''. We will not discuss those and instead will translate the above field theoretic formulation 
to the familiar differential equation language. 



C. Differential operators and the generating function 

In the field theoretic form, the computations are carried out by commutator manipulations that sometimes are awkward. 
Fortunately, it turns out that we may convert the operator equation (|2} into a partial differential equation (PDF). To accomplish 
that, we explore the analogy between a , and d/dx , x. Not only do they have the same commutator 

[a, a)] = 1 [-7-, 2;] = 1 , 



but, a more comprehensive correspondence is found: 



dx 



aat |0 >= |0) -!Lx^l- 
dx 

a|0)=0-^ -7-1 ==0; 



dx 



aat" |0) =nat"-i|0) 4=> —x" 

ax 



a^at'MO) = — aT"-™|0 >^ ( — )"a;" = — a;""" , forn > m. 
m! ax ml 



4 



Y. Lan, P.G. Wolynes, & G. A. Papoian 



A variational approach to stochastic signal transduction 



From these, we can also deduce for any smooth function / that 

a'"/(at)|0)^(|-r/(x). 

The analogy j^") = ^'('^)a^"|0) <;==^ 'i'(x) = P{n)x^ converts a wavefunction to a generating function. The inner 
product with the harvesting state corresponds to evaluation at .t = 1. It is easy to check the following relations, 

< 0|e"|* > = ^-(1); 
<0|eV(«^)|*> = /(1)*(1), 
as e'^/''^/(x) = f{x + l). 

The wave equation (|2j becomes then a PDE for the generating function after all the necessary conversions are done. For the 
2-step cascade, this expression is 

— = {l-x)ifiy-—-XN + Xx—)^+giy-l)^-k{y-l)—, (4) 
ot oxoy ox ay 

where the first term describes the A — A* reaction and the rest the R~R* reaction. Generating functions were previously used to 
treat unary reactions in the gene regulatory networkiSiffi. The second order derivative term d^'i'/dxdy in Eq.0 is characteristic 
of a binary reaction, indicative of nonlinear kinetics. This higher derivative changes the order of the PDE and adds significant 
difficulty to solving Eq.@. In the generating function formalism, the equation \I'(1) = 1 encodes the conservation of probability 
and the first two moments are given by 

a* 

in) = ^U=i, 

where j^^^i means evaluation at a; = 1. Therefore, the moments are obtained when the generating function is expanded at x = 1, 
while the probability distribution is obtained from the Taylor coefficients when the generating function ^ is expanded at x = 0. 



D. The variational solution 



In the QFT formulation of the stochastic processes, a variational principle may be derived which is equivalent to the evolution 
equation This principle indicates that the physical solution of Eq.(|2|i is given by the stationary points of the following 
functional22i^2iS 

POO 

ff[$L , = / dt{^L\dt - mii) , (6) 
Jo 

where $l and are arbitrary quantum states under quite general constraints consistent with the positivity of probabilities and 
the fixed boundary conditions. In practice, we take a finite-dimensional subset of the infinite-dimensional function space and 
apply the variational principle in this subspace to get closed equations that may be subsequently solved by simple numerical 
calculation. If the essential qualitative properties of the system are known, good approximations of the original problem can be 
achieved through an informed choice of time-dependent basis functions that define the relevant subset in the function space. 

Because ft is not Hermitian, the right and left eigenvectors are different. To characterize the system, we, therefore, need two 
sets of vectors $l and The stationary variation condition for restores the original equation (|2ji and that for defines 
an equation satisfied by $l. If we view the operator 9t — f2 as a large matrix parameterized by t, the $l and $ fi generated by the 
stationary variation condition correspond to its singular vectors^i*^ and the extremum values of Eq. ^ are the singular values. 
Physically, from the Schrodinger picture point of view, is the evolving quantum state and $l represents the measurable 
quantities in which we are interested. Eq. ^ serves to find the most significant state and physical observables. Eyink originally 
applied this variational principle to Fokker-Planck equations^. Subsequently, Sasai and Wolynes used this variational approach 
in the field theoretic form and obtained moments in a toggle-switch gene regulatory problemSi In this paper, we show how the 
variational principle may be applied, instead, to the generating functions. We introduce novel basis functions to obtain the time- 
dependent probabihty distributions in signal transduction cascades. Another novelty of the present formulation is our avoidance 
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of cumbersome commutator computations in the derivation of the evolution equations, making more transparent the physical 
significance of the variational method. 

There are many ways to choose the time-dependent basis functions. We follow the approach of Sasai and Wolynes^S; 

I'fR) = $«(at,{/,(i)}^^0|0) (7) 
= (0|exp(a)(l + ^c,(t)a), (8) 

i=l 

where n is the number of unknown functions in \^b) and m is the number of parameters in Eq. (|8}. The exponential factor in 
\ acting on (0| gives the harvesting state. If we substitute this "ansatz" into Eq.(|6} and carry out the variations with respect 
to Ci, a finite set of ODEs for the evolution of {fi{t)} are obtained, which then determines the evolution of the probability 
distribution. 

As mentioned, the variational method can also be recast into the generating function language using the conversion scheme 
discussed previously. Now $i becomes a differential operator and $fl a guess function of variable x. For example, Eq. ( I7I8> 
corresponds to 



)- 

dx 



1=1 



iJ= / dt^L{dt~n)^R\.=i, (10) 



The functional H simply becomes 



In the new picture, we have much simpler mathematical operations, e.g., the variational principle becomes simply a function 
extremization condition 

5H , 

^^l{c,(t)=o},<„,x=i = 0, fori== l,2,...,m. (11) 

Or equivalently 

=0,, fori = l,2,...,m. (12) 

The evolution of the generating function should always conserve the total probability. As in Eq. 0, the total probability ^'(1,1) 
does not change with time. The proper choice of should also guarantee this invariance, satisfying {dt — Q)^ii\x=i ~ 0. 
Now, Eq. M2\ tells us that the higher derivatives of the expression [dt — ^)^r evaluated at a; = 1 are also zero. Therefore, 
in the limit of to ^ oo, Eq. (I12> leads to the PDE dt^ r — r- For finite to, this PDE is approximately satisfied in the 
neighborhood of x = 1. 



III. NUMERICAL APPLICATIONS 



In this part, we discuss the implementation of the PDE version of the variational method and apply it to several simple, yet 
important enzymatic cascades. Before proceeding to the individual examples, we emphasize our motivation for selecting the 
time-dependent basis functions. We also briefly discuss several alternative methods also used to solve the master equation. 



A. Computational details 

It is reasonable to require the following constraints on the right basis function $ ^^ (.t, y): 

(1) the total probability should be equal to 1, i.e., <I>_r(1, 1) = 1; 

(2) the probability should be positive, i.e., the coefficients of the Taylor expansion of around [x^y) ~ (0,0) should be 
nonnegative; 

(3) the time rate of the unknown functions, fi{t), should be obtainable by solving Eq. (I12> derived from the variational principle. 
In the following, we introduce two sets of basis functions. One set is simple but is of limited applicability, while the other is in 
a more complex integral form and can be applied very generally. 
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We use simple left basis functions, <i>i (x, y). As suggested in Eq. ( II U . they are represented by differential operators and can 
be easily extended to multi-variate cases. For the 2-step cascade, we use 

^L,i^l + cA+C2dy (13) 

with a simple right basis function (see Eq. M%\ below) and 

<^La^^LA+c3.dyy (14) 

with a more complicated right basis (see Eq. M\\ below). For the 3-step cascade discussed below, we use 

^L.S = *L,2 + C4a. + Cs^^^ . (15) 

Following a similar pattern, we can simply write the left basis function for the 4-step cascade discussed below, as, 

^LA = *L,3 + Cad^, + CjOn^iu ■ (16) 

In all above equations, , d^x denote the first and second derivative with respect to x, and so on. Other choices of the left basis 
function are, of course, possible. The current choice is simple and gave the best results among different basis functions which 
we tried. "Maple" symbolic software was used to derive the time evolution equations and, subsequently, "Matlab" numerical 
software was used to carry out the evolution of equations of motion and for plotting the computation results. 

To validate our calculations, we used the Gillespie simulation^ilSiiiii results as the reference (exact) solutions. 10^ stochastic 
trajectories were sampled to derive the distributions and other statistical quantities. At the same time, the variational method was 
also compared with two commonly used methods - Langevin equation^! and ri-expansion^4Si^iife. In the Langevin equation 
simulation, we also used 10^ realizations. In addition, to prevent the appearance of negative particle numbers, we applied 
the selection procedure commonly used^. It is awkward and time-consuming to use the fi-expansion method to compute the 
distributions. We only applied it to the simplest 2-step cascade. 



B. Application to a two-step amplification cascade 

In a previous papes^S, approximate analytical solutions for @ in certain parameter range were obtained using the method of 
characteristics. If the initial conditions correspond to zero i?*'s and N ^'s, then a generating function solution reads, 

= + { \ + lm{t) + ATi^'~*'^'""'*"0 " ^^^"^ exp(m(t)(a: - 1)) , (17) 

where m{t) ~ |(1 — e^'^*) is the average number of R* at time t. We make use of this specific functional form and try the 
following ansatz, 

$i^-(l + /2(^)(2/-l))'^e(--^)^^(*^ (18) 

which results in the following 2D ODEs 

A = 5 - fc/i 

h = A(l - /2) - ///1./2 . (19) 

These equations have a particularly simple physical explanation - they correspond to the deterministic chemical kinetics equa- 
tions since /i and N * /2 are equal to the average numbers of R* and A, respectively. But now, we may obtain probability 
distributions through Eq. jl8l l. For example, the variance of A can be easily calculated as cr^ ~ {n?) — (n)'^ ~ f2{t) — fiW- 

These ODEs can be solved exactly and we show in Fig. |2]the probability distribution of A* att = 30 for two sets of parameter 
values. Also shown in the figure are results obtained from calculations using more traditional techniques. The first set of 
reaction rate parameters were chosen as g = 10 , /c = 5 , /i = 0.02 , A = 0.15, with the initial conditions {Nji, Nj^-^ , Na, Na* ) = 
(20, 0, 5, 0). Since the R—R* reaction is much faster than the A— A* reaction, one expects Eq. jl7> to be a good approximationiii 
Indeed, in Fig. |2ja), the variational ansatz Eq. ( II8I 1 leads to a result that overlaps significantly better with the exact Gillespie 
calculation, compared with the results from fi-expansion and Langevin equation . The 0-expansion result turns out to be more 
concentrated than the exact result, while the Langevin equation does not work well near the left boundary, shifting the average 
to the right. 

For other parameter values, as long as the R—R* reaction is fast, the ansatz Eq. ( I18> works fine as expected^S^. However, 
if the first reaction is considerably slower than the second one, this ansatz becomes less useful, as shown in Fig. |2jb) for 
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FIG. 2: Comparison of the computed distributions for Na' at f = 30 for the 2-step cascade: Gillespie simulation (circles), one-term basis 
(dashdotted line), integral form basis (solid line), fi-expansion (dotted line), Langevin equation (dashed line), (a.) g — 10 , k — 5 , fi — 
0.02, A = 0.15 with initial condition {Nr, Nr- , Na, Na') = (20,0,5,0) and (b) g = 0.2 , fc = 0.1, p = 0.02 , A = 0.15, with 

{Nr,Nr.,Na,Na'} = (20,0,20, 0). 



g = 0.2 , A: = 0.1 , /I = 0.02 , A = 0.15, with {Nr, Nr, , Na, Na' ) = (20, 0, 20, 0). The variational result gives a too narrow 
distribution. The Langevin equation is still not accurate on the left boundary, the average being shifted to the right. 

In general, the ansatz ( I18> tends to generate a distribution narrower than the exact one, which is also shown in Fig. lUb). This 
can be explained as follows. The ansatz ( I18> is a product of functions of x and y and hence only the average particle number /i 
appears in the second equation of il9i . Therefore, the fluctuation generated in the R — R* reaction is absent in the treatment of 
the A — A* reaction. Physically, if the first reaction is fast, then the second reaction only "sees" an average number of R*, with 
its fluctuation averaged out, and the ansatz ( I18> produces accurate results (Fig. |2ja)). If the first reaction is slow, however, then 
the fluctuations in the number of R* strongly influences the A~ A* reaction and the mere average /i is not capable of passing 
this information. The distribution computed from ansatz (I18> only accounts for the internal fluctuation of A — A* reaction and 
hence has a narrower profile than the exact result. On the other hand, despite the apparent simplicity, this ansatz allows one 
to estimate fluctuations in a reaction network in a semiquantitative way, with an extremely low computational cost, similar to 
solving the ordinary deterministic kinetics equations. 

It is straightforward to generalize ansatz (I18> to longer cascades. For example, for the 3-step cascades considered next, we 
may write the right ansatz as 

$7? = (1 + hm^ - i))''Hi + /2(t)(y - i))^e-^^ . (20) 

The resulting ODEs for fi (t ) 's are similar to Eq. ( I19> and have a physical interpretation related to the chemical kinetics equations, 
as discussed above. 

To get more accurate result, we have to convolute the number fluctuation of R* with the number fluctuation of A. Since the 
ansatz based on simple separation of variables does not work, we need an equation in which cc , ?/ are explicitly entangled. To 
facilitate the computation, we use the following integral form representation: 



ds-^(l + /2(i)e-(^-^=^(*»-(y-l) + /i(t)(x-l)) , (21) 

where fi{t) is related to the R — R* reaction and f2{t) , /^{t) are related to the A — A* reaction. Note that $^(1, 1) = 1. For 
y = 1, we get the expected generating function for R* 

<f>fl(x, 1) = (1 + /i(t)(x - 1))^ « e"^^(*)(--i) , (22) 
the above approximation being valid when /i (t) is small, which is true in all simulations below. We could have used 

N 



^ (l + /2(t)e-(^-/^(*»'(y - 1)) cxp((s - m)\x ~ 1)) 
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(a) P{Na' ) at i = 6 (b) PiNA' ) at t = 30 



FIG. 3: Comparison of the computed distributions for Na* at t = 6 and t = 30 for the 2-step cascade: Gillespie simulation (circles), integral 
form basis (solid line), $l-expansion (dotted line) and Langevin equation (dashed line). g = 2,k = l,ii = 0.02 , A = 0.15 with initial 
condition {Nr, Nr. ,Na,Na') = (20, 0, 100, 0). 




10 20 30 10 20 30 

t t 

(a) The average (b) The variance 

FIG. 4: Comparison of the average and variance computed for the 2-step cascade in time interval t = [0, 30]: Gillespie simulation 
(circles), one-term basis (dashdotted line), integral form basis (solid line), fi-expansion (dotted line) and Langevin equation (dashed line). 
g = 2,fc = l,^ = 0.02 , A = 0.15 with initial condition {Nr, Nr* , Na,Na' ) = (20, 0, 100, 0). 

in the integrand of Mil to achieve a larger range of /i. But when the number of R* is small, ansatz J21> produces better results, 
probably due to its more convoluted form. 

Now we can control both the average and the variance of A by manipulating f2{i) and f^{i). Roughly speaking, /2(t) controls 
the average and /3(t) controls the variance. For the same parameter set shown in Fig. |3b), we did the computation by using 
ansatz Eq. M\\ and displayed the result in the same figure (solid line). It matches closely with the exact result, better than all 
other computations. 

To show the effectiveness of the ansatz Mil , we use it to do one more computation with g = 2,k = l,fi = 0.02 , A = 0.15 
and the initial condition {Nr, Nr, , Na, Na' ) = (20, 0, 100, 0). In Fig. the distributions of A* are displayed at i = 6 and 
t = 30. Although, the result from ansatz Mil is slightly narrower than the Gillespie computation, they match very well both at 
t = 6 and at i = 30. Actually, this is true for all times as can be seen in Fig. |3 where the time evolution of the average and 
the variance are depicted. In Fig. |3land|4] the average of A* from Langevin equation is always greater than the exact result as 
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explained before, although the variance is computed accurately. Curiously, the average from il-expansion is smaller initially but 
later grows larger than the exact one. We also plotted the computation results from Eq. ( I18t . It gives a smaller variance even 
though the average is quite accurate. In this case, the R* fluctuation is important. 



C. Application to a three-step amplifleation cascade 

It is not hard to write ansatze similar to Eq. J21> for longer or more complicated cascades. In this section, we demonstrate the 
use of the variational method for a 3-step cascade with and without feedback loop. In the next section, we will write the equation 
for a 4-step cascade. 

Assume that A* catalyzes a subsequent enzyme activation/deactivation reaction B ^ B* with a forward rate ^2 and a 
backward decay rate A2. The total number N2 of B and B* is a constant during the reaction. Following similar procedures as 
before, we found that the generating function ^'(a;, y, z) satisfies 

+ " y^^^'^'^y ~ + + ^("^ ^^"^ ' ^^^^ 

where the first term describes the B — B* reaction. The ansatz similar to Eq. (12 U reads 



<i>R{x,y,z)= J d,s^(l + /4(t)e-(^-/^(*»'(z-l)) ' (l + /2(t)e-(^-/-^(*»'(j/-l) + /i(t)(x-l)) , (24) 

where /4(t), fbit) describes the B — B* reaction. The calculation results from this ansatz are shown in Fig. |5ja),|6la) and 
Eta). 




(a) without feedback (b) with feedback 

FIG. 5: Comparison of the computed distributions for Nb* at f = 60 for the 3-step cascade without (a) and with (b) negative feedback: 
Gillespie simulation (circles), integral form basis (solid line), Langevin equation (dashed line), g = 0.2 ,k — 0.1 , ^ — 0.02 , A = 0.15 , /i2 = 
0.01 ,A2 = 0.07, ti3 = 0.01 with initial conditions {Nr, Nr* , Na, Na- , Nb, Nb-) = (20,0,20,0,30,0). 

Shown in Fig.|5ja) is the B* distribution computed from different methods. Ansatz ( I24> computation matches very well with 
the exact solution while the Langevin profile is shifted to the right. On the left boundary, both ansatz i24\ and Langevin equation 
approach zero while the exact solution has a finite value there. Interestingly, the variance shows a maximum value during the 
evolution as displayed in Fig. Qa). The computation from ansatz J24t captures this non-monotonous behavior accurately which 
is not obvious at all in the Langevin computation. 

Next, we consider a 3-step signaling cascade with a feedback loop. For example, we can imagine a reaction in which B* turns 
off the R* signaling, by catalyzing the R* ^ R decay at a rate 1^13 (Fig. |8j. Mathematically, this coiTesponds to adding an extra 
term 

Ai3(l - y)(iV2a*/9a; - zd^'^i' /dxdz) 
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(a) Without feedback 



(b) With feedback 



FIG. 6: Comparison of the Nb* average computed for the 3-step cascade without (a) and with (b) negative feedback in time interval t — [0, 60] : 
Gillespie simulation (circles), integral form basis (solid line), Langevin equation (dashed line), g = 0.2 , fc = 0.1 , ^ = 0.02 , A = 0.15 , /12 = 
0.01 , A2 = 0.07 , At3 = 0.01 with initial condition {Nr, Nr. , Na, Na' , Nb,Nb' ) = (20, 0, 20, 0, 30, 0). 
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(a) Without feedback 



(b) With feedback 



FIG. 7: Comparison of the Nb* variance computed for the 3-step cascade without (a) and with (b) negative feedback in time interval 
t = [0, 60]: Gillespie simulation (circles), integral form basis (solid line), Langevin equation (dashed line), g = 0.2 , k = 0.1 , fj, = 0.02 , A = 
0.15 , H2 = 0.01 , A2 = 0.07 , n-3 = 0.01 with initial condition {Nr, Nr> , Na, Na* , Nb,Nb' ) = (20, 0, 20, 0, 30, 0). 



to the right hand side of PDE J23> . We may still use the same right ansatz J24l i and the results are displayed in Fig. |5Ib),|5Jb) and 
EJb). Surprisingly, despite the time scale mixing and nonlinearity, the variational computation matches even better with the exact 
result than without feedback (compare |5] (a) and (b)). The relative shift of the average computed from the Langevin equation 
increases. The maximum in the variance still exists but its height decreases with the variance itself. In this case, it seems that 
the negative feedback sharpens the signal. 



D. Application to a four-step amplification cascade 

Our last demonstration of the variational method is concerned with a 4-step cascade. We append a further enzymatic reaction 
C ^ C* to our 3-step cascade without feedback. In this reaction, the protein C is switched on with a rate by B* and decays 
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FIG. 8: An inactive receptor R, when activated by a signal, activates downstream protein A, which in turn activates protein B. In a negative 
feedback loop, B downregulates the R activation. 



at a rate A3. Again, the total number N-j, of C and C* is a constant during the reaction. Routinely, we add the corresponding 
extra term 

(1 - w){^~li-?.z^ AsA^s + (A3W + M3A^2)t^)* 

ozow ow 

to the right hand side of Eq. ( I23> . The right ansatz is also postulated following the previous pattern, 

N3 ^ , N W: 

N 



'^R{x,y,z,w) = J (l + /6(i)e-(^-^^(*»' («;-!)) ' (l + A We-(^-^=(*»' (z - 1)) 

(1 + /2(i)e-(-^-/-^(*»'(y - 1) + /i(t)(.T - 1))"^ , (25) 



where feit), frit) describes the C — C* reaction. The computations for a particular set of parameters were carried out and the 
results are depicted in Fig. l^and[TUI 

0.1^ 




FIG. 9: Comparison of the computed distributions for Nc* at t = 100 for the 4-step cascade: Gillespie simulation (circles), integral form basis 
(solid line) and Langevin equation (dashed line), g = 0.2 , fc = 0.1 , = 0.02 , A = 0.15 , ^2 = 0.01 , A2 = 0.07 , ^3 = 0.005 , A3 = 0.05 
with initial condition {Nr, Nr* , Na, Na- , Nb,Nb' , Nc,Nc' ) = (20, 0, 20, 0, 30, 0, 50, 0). 

In Fig. |9] the distributions of C* at t = 100 from different calculations agree with each other very well. The variational 
profile is slightly narrower than the exact one but the averages overlap at all times (see Fig. llOt a)). Now, the maximum in the 
variance becomes more pronounced. Even the Langevin computation clearly displays this feature in Fig. llOr b) though its peak 
is considerably smaller than the exact one. 
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t t 

(a) The average (b) The variance 

FIG. 10: Comparison of the Nc* average and variance computed for the 4- step cascade in time interval t = [Q, 100]: Gillespie simulation 
(circles), integral form basis (solid line) and Langevin equation (dashed line), g = 0.2 ,k = 0.1 , p = 0.02 , A = 0.15 , ^2 = 0.01 , A2 = 
0.07,At3 = 0.005, A3 = 0.05 with initial condition {Nr, Nr* , Na, Na- , Nb, Nb- , Nc , Nc) = (20,0,20,0,30,0,50,0). 



IV. THE ADVANTAGES AND DRAWBACKS OF THE VARIATIONAL PRINCIPLE 

The enzymatic reaction cascades of various lengths considered in section UTI] are a common occurrence, for instance, in the 
MAPK family of signaling cascades^i^2i^2i2£. It is straightforward to extend the use of our variational scheme to more complex 
cases, to cascades or networks with complex topology. In general, the generating function is to be postulated in an integral form, 
as demonstrated earlier for the 2-, 3-, and 4-step cascades, with the time-dependent parameter functions being determined by a 
set of ODEs derived from the variational principle. Our scheme may be used to treat both the small and large particle number 
systems. It is many orders of magnitude computationally more efficient in computing the distributions compared with alternative 
numerical simulation techniques such as the Gillespie algorithm or Langevin equation. 

However, in the current form, the ansatz has a number of practical limitations, discussed next. It is difficult to represent effi- 
ciently distributions with multiple peaks, for example, or to directly compute transition rates between two deterministically stable 
states, a common scenario in a gene switch modeling. Another problem is that the derived set of ODEs is quite complicated, 
thus, symbolic algebra software is necessary to carry out the necessary manipulations. The method accuracy may also depend on 
the choice of the left ansatz. We chose the current left ansatz form from several trials for simplicity and efficiency. Sometimes 
when Eq. the time evolution of the unknown functions may result in a possible singularity, requiring workarounds. For 
reaction types other than the enzymatic one discussed here, such as the binding reactions, the current basis functions may not 
work properly, since the total particle number of one species (including both the activated and inactive ones) is required to be 
constant. This may not be true for some arbitrary reaction, necessitating development of new basis functions. However, this is 
straightforward, and the general principles and considerations that were discussed are expected to apply to those cases as well. 

In general, sufficient accuracy may be achieved with a large number of basis functions. The probability distributions are 
obtained when is expanded at a; = and the moments are obtained when $jj expanded at a; = 1. Therefore, together 
with its derivatives is to be well approximated in the whole interval x £ [0, 1]. However, the variation equations only 
consider the validity of Eq. (|2ji in the neighborhood of x = 1. It is difficult to estimate the error bounds of 'i> and especially 
its derivatives near a; = 0, though we know that it generally decreases with increasing accuracy at a; = 1. The choice of basis 
function, therefore, is essential for the variational technique to be successful. 

From the above considerations one may expect that the variational principle itself may still be improved. In Fig. \^ we show 
the distributions of A* calculated with different methods with a parameter set g ~ 0.4 , fc = 0.1 , /i = 0.02 , A = 0.15 and initial 
conditions (iVfl, Nn* , Na, Na' ) = (20, 0, 100, 0). In this case, the R—R* reaction is unusually slow compared with the A — A* 
reaction so that the large fluctuations in the first reaction are retained in the second one. To obtain a highly accurate solution in 
this parameter regime, a special convolution form was used to solve the generating function PDF in our previous work^S. The 
current variational scheme, however, underestimates the A* distribution variance. By manually adjusting the /2 , /s, we may 
obtain a much better fit (solid line in Fig. II H b)). demonstrating that the variational calculation does not necessarily provide an 
optimal solution. However, these results suggest that the present time-dependent basis sets are powerful enough to account for 
these extremely broad distributions. From the experience of numerical solution of ODEs and the conventional variational method 
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(a)P(AfA*) att = 30 (b) PiNA") at t = 30 



FIG. 11: Comparison of the computed distributions for at t = 30 for the 2-step cascade: Gillespie simulation (circles), integral form 
basis (solid line), 57-expansion (dotted line) and Langevin equation (dashed line), g = 0.4 , fc = 0.1, = 0.02 , A = 0.15 with initial 
condition {Nr, Nr> , Na,Na') = (20,0,100,0). (a)/2(t = 30) = 0.7304,/3(t = 30) = 0.1413 calculated by integration of Eq.fT9l 
(b)/2(t = 30) = 0.78 , f?,{t = 30) = 0.3 estimated by best fitting the exact solution. Shown in the picture is only the distribution profile on 
[0, 60] with other part close to zero. 

in quantum mechanics, a better variational strategy may be to consider simultaneously the validity of Eq. (|3 at all points on the 
interval [0, 1]. We are currently developing an improved variational approach to address some of the shortcomings discussed 
above. 



V. SUMMARY 

Cells live in a fluctuating environment in which signals and noise keep bombarding the cell receptorsiiiiS. Noisy signals 
propagate inside the cell via microscopic chemical reaction events. Cells have evolved to adapt to or even exploit the seemingly 
deleterious effect of fluctuations on signaling dynamics within a mesoscopic size object. Thus, it is important to develop a 
qualitative picture, based on mathematical modeling of stochastic chemical kinetics, of how signaling networks process noisy 
signals. In this paper, we applied a variational principle to the solution of the master equation which describes the noisy signal 
propagation. 

The essential difficulty associated with the master equation approach is the enormous number of ODEs involved. To compactly 
encode information, we use a QFT formulation in which the evolution of probability distributions is governed by one "quantum" 
wave equation. We have explicitly demonstrated the equivalence of the field theoretic formalism with the generating function 
approach, greatly facilitating the practical application of the variational technique proposed by EyinkS. We further examined 
the significance of the variational principle in this context. According to our previous investigation™, we suggest two novel 
classes of time-dependent basis functions: one is in simple algebraic form and another is in an integral convolution. These basis 
functions are key to the successful application of the variational method to various signaling pathways. We applied the new basis 
functions to describe stochastic signaling in 2-step, 3-step and 4-step enzymatic cascades and compared the obtained results 
with alternative solution techniques. The variational scheme presented here works favorably in a large parameter range. It treats 
effectively both the small and the large particle numbers, and is orders of magnitude faster to compute compared with various 
Monte Carlo simulation algorithms. 

However, the current scheme has also some limitations. The resulting evolution equations may be complicated and their 
derivation requires considerable symbolic manipulation, somewhat ameliorated by using modern computer algebra software. 
We also showed that the variational principle itself in this context is not the most optimal. Despite these shortcomings, the 
present variational approach may already be profitably applied to various signal transduction pathways, allowing one to obtain 
quantitative and semiquantitative solution to stochastic signaling dynamics in a broad range of parameters. The technique may 
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be further improved to extend its limits of applicability, which is a work in progress. 
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